Search results for "topic model"

showing 10 items of 23 documents

Multi-label Classification Using Stacked Hierarchical Dirichlet Processes with Reduced Sampling Complexity

2018

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, it was reduced to be linear in the number of topics per word using a technique called alias sampling combined with Metropolis-Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper hierarchy level change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity, making it linear i…

Topic model; Computational complexity theory; Computer science; 02 engineering and technology; Latent Dirichlet allocation; Dirichlet distribution; symbols.namesake; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Mathematics; Multi-label classification; business.industry; Sampling (statistics); Pattern recognition; Human-Computer Interaction; Dirichlet process; Metropolis–Hastings algorithm; Hardware and Architecture; Test set; symbols; 020201 artificial intelligence & image processing; Artificial intelligence; business; Algorithm; Software; Information Systems; Gibbs sampling; 2017 IEEE International Conference on Big Knowledge (ICBK)
researchProduct

2021

Staying at the front line in learning research is challenging because many fields are rapidly developing. One such field is research on the temporal aspects of computer-supported collaborative learning (CSCL). To obtain an overview of these fields, systematic literature reviews can capture patterns of existing research. However, conducting systematic literature reviews is time-consuming and does not reveal future developments in the field. This study proposes a machine learning method based on topic modelling that takes articles from a systematic literature review on the temporal aspects of CSCL (49 original articles published before 2019) as a starting point to describe the most recent deve…

Cooperative learning; Topic model; Educational research; Systematic review; Point (typography); Content analysis; Computer science; Collaborative learning; Data science; Field (computer science); Education; Frontline Learning Research
researchProduct

Supervised vs Unsupervised Latent Dirichlet Allocation: topic detection in lyrics.

2020

Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to a particular topic. It builds a fixed number of topics starting from the words in each document, modeled according to a Dirichlet distribution. In this work we apply LDA to a set of songs from four famous Italian songwriters and split them into topics. This work studies the use of themes in lyrics, using statistical analysis to detect topics. The aim of the work is to underline the main limits of the standard unsupervised LDA and to propose a supervised…

LDA; Correspondence Analysis; Music mining; Settore SECS-S/01 - Statistica; Topic modeling
researchProduct

Statistically Validated Networks for assessing topic quality in LDA models

2022

Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has gained increasing popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned from the model may be characterized only by a set of irrelevant or unchained words, making them useless for interpretation. Although many topic-quality metrics were proposed (Newman et al., 2009; Alet…

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; Settore SECS-S/01 - Statistica; Topic Model; Topic Coherence; LDA; Statistically Validated Networks
researchProduct

Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches

2022

Probabilistic topic models are machine learning tools for processing and understanding large text document collections. Among the different models in the literature, Latent Dirichlet Allocation (LDA) has turned out to be the benchmark of the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic follows a multinomial distribution over the vocabulary words. To understand the result of a topic model, researchers usually select the top-n words (the essential words) with the highest probability given a topic and look for meaningful and interpretable semantic themes. This work proposes a new method …

Statistically Validated Network; LDA; Topic Model
researchProduct

Measuring topic coherence through statistically validated networks

2020

Topic models arise from the need to understand and explore large text document collections and to predict their underlying structure. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has quickly become one of the most popular text modelling techniques. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models give no guarantee of the interpretability of their outputs. The topics learned from texts may be characterized by a set of irrelevant or unchained words. Therefore, topic models require validation of the coherence of estimated topics. However, the automatic evaluation …

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; topic model; topic coherence; LDA; statistically validated networks; Settore SECS-S/01 - Statistica
researchProduct

Ranking coherence in topic models using statistically validated networks

2023

Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it remains an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…

Statistically Validated Networks; Topic coherence; Text Mining; Probabilistic Topic model; Library and Information Sciences; Information Systems; Journal of Information Science
researchProduct

Establishing Video Game Genres Using Data-Driven Modeling and Product Databases

2015

Establishing genres is the first step toward analyzing games and how the genre landscape evolves over the years. We use data-driven modeling that distils genres from textual descriptions of a large collection of games. We analyze the evolution of game genres from 1979 to 2010. Our results indicate that until 1990 many genres competed for dominance, but thereafter sport-racing, strategy, and action have become the most prevalent genres. Moreover, we find that games vary to a great extent as to whether they belong mostly to one genre or to a combination of several genres. We also compare the results of our data-driven model with two product databases, Metacritic and Mobyga…

Cultural Studies; Topic model; ta520; Game genre; Computer science; genres; videopelit; digital games; genret; 050801 communication & media studies; text mining; computer.software_genre; Data-driven; 0508 media and communications; Arts and Humanities (miscellaneous); quantitative; ta517; ta518; topic model; Metacritic; Video game; ta512; game corpus; Applied Psychology; ta515; ta113; Database; ta213; Communication; tekstinlouhinta; 05 social sciences; 050301 education; video games; Human-Computer Interaction; data-driven modeling; Dominance (economics); Anthropology; 0503 education; computer; digitaaliset pelit; Mobygames; game genre; GAMES AND CULTURE: A JOURNAL OF INTERACTIVE MEDIA
researchProduct

Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models

2017

Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while …

Topic model; Hierarchical Dirichlet process; Speedup; Gibbs algorithm; Computer science; Nonparametric statistics; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Latent Dirichlet allocation; Bayes' theorem; symbols.namesake; ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; symbols; Algorithm; 0105 earth and related environmental sciences; Gibbs sampling
researchProduct

Social Collaborative Viewpoint Regression with Explainable Recommendations

2017

A recommendation is called explainable if it not only predicts a numerical rating for an item, but also generates explanations for users' preferences. Most existing methods for explainable recommendation apply topic models to analyze user reviews to provide descriptions along with the recommendations they produce. So far, such methods have neglected user opinions and influences from social relations as a source of information for recommendations, even though these are known to improve the rating prediction. In this paper, we propose a latent variable model, called social collaborative viewpoint regression (sCVR), for predicting item ratings based on user opinions and social relations. To th…

ta113; Topic model; Information retrieval; Computer science; topic modeling; 02 engineering and technology; Recommender system; trusted social relations; Viewpoints; Social relation; Regression; 020204 information systems; Benchmark (surveying); 0202 electrical engineering electronic engineering information engineering; user comment analysis; 020201 artificial intelligence & image processing; recommender systems; Tuple; Latent variable model; Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
researchProduct